Hierarchical Model-Based Clustering for Relational Data

نویسندگان

  • Jianzhong Chen
  • Mary Shapcott
  • Sally I. McClean
  • Kenneth Adamson
چکیده

Relational data mining deals with datasets containing multiple types of objects and relationships that are presented in relational formats, e.g. relational databases that have multiple tables. This paper proposes a propositional hierarchical model-based method for clustering relational data. We first define an object-relational star schema to model composite objects, and present a method of flattening composite objects into aggregate objects by introducing a new type of aggregates – frequency aggregate, which can be used to record not only the observed values but also the distribution of the values of an attribute. A hierarchical agglomerative clustering algorithm with log-likelihood distance is then applied to cluster the aggregated data tentatively. After stopping at a coarse estimate of the number of clusters, a mixture model-based method with the EM algorithm is developed to perform a further relocation clustering, in which Bayes Information Criterion is used to determine the optimal number of clusters. Finally we evaluate our approach on a real-world dataset.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

HIERARCHICAL DATA CLUSTERING MODEL FOR ANALYZING PASSENGERS’ TRIP IN HIGHWAYS

One of the most important issues in urban planning is developing sustainable public transportation. The basic condition for this purpose is analyzing current condition especially based on data. Data mining is a set of new techniques that are beyond statistical data analyzing. Clustering techniques is a subset of it that one of it’s techniques used for analyzing passengers’ trip. The result of...

متن کامل

A Hybrid Grey based Two Steps Clustering and Firefly Algorithm for Portfolio Selection

Considering the concept of clustering, the main idea of the present study is based on the fact that all stocks for choosing and ranking will not be necessarily in one cluster. Taking the mentioned point into account, this study aims at offering a new methodology for making decisions concerning the formation of a portfolio of stocks in the stock market. To meet this end, Multiple-Criteria Decisi...

متن کامل

Learning Multiple Hierarchical Relational Clusterings

Three important generalizations of the basic clustering problem are relational, hierarchical, and multiple clustering. This paper proposes the first approach to clustering that unifies all three. We describe a general probabilistic model for relational clustering, and show that flat, hierarchical and multiple relational clustering models are special cases. This paper also describes an efficient...

متن کامل

Dynamic Hierarchical Clustering

We describe a new method for dynamically clustering hierarchical data which maintains good clustering in the presence of insertions and deletions. This method, which we call Enc, encodes the insertion order of children with respect to their parents and concatenates the insertion numbers to form a compact key for the data. We compare Enc with some more traditional approaches and show in what cir...

متن کامل

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004